Learning with Confident Examples: Rank Pruning for Robust Classification with Noisy Labels

Authors

  • Curtis G. Northcutt
  • Tailin Wu
  • Isaac L. Chuang
Abstract

P̃Ñ learning is the problem of binary classification when training examples may be mislabeled (flipped) uniformly, with noise rate ρ1 for positive examples and ρ0 for negative examples. We propose Rank Pruning (RP) to solve P̃Ñ learning and the open problem of estimating the noise rates. Unlike prior solutions, RP is efficient and general, requiring O(T) time for any unrestricted choice of probabilistic classifier with fitting time T. We prove that RP achieves consistent noise estimation and the same expected risk as learning with uncorrupted labels under ideal conditions, and we derive closed-form solutions when conditions are non-ideal. RP achieves state-of-the-art noise rate estimation and F1, error, and AUC-PR on the MNIST and CIFAR datasets, regardless of noise rates. To highlight, RP with a CNN classifier can predict whether an MNIST digit is a one with only 0.25% error, and achieves 0.46% error across all digits, even when 50% of positive examples are mislabeled and 50% of observed positive labels are mislabeled negative examples.
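The noise model in the abstract can be illustrated with a small simulation. The sketch below is not Rank Pruning itself. It only generates noisy labels, assuming ρ1 is the probability that a true positive is flipped to negative and ρ0 the probability that a true negative is flipped to positive. The function names `flip_labels` and `empirical_rates`, the class balance, and the value ρ0 = 0.2 are hypothetical choices made for illustration.

```python
import random


def flip_labels(y, rho1, rho0, seed=0):
    """Return noisy labels s for true binary labels y.

    Assumption: each positive (1) is flipped to 0 with probability rho1 and
    each negative (0) is flipped to 1 with probability rho0, independently.
    """
    rng = random.Random(seed)
    s = []
    for label in y:
        flip_prob = rho1 if label == 1 else rho0
        s.append(1 - label if rng.random() < flip_prob else label)
    return s


def empirical_rates(y, s):
    """Measure the realized flip rates and the contamination of observed positives."""
    noisy_of_true_pos = [si for yi, si in zip(y, s) if yi == 1]
    noisy_of_true_neg = [si for yi, si in zip(y, s) if yi == 0]
    true_of_obs_pos = [yi for yi, si in zip(y, s) if si == 1]
    return {
        # fraction of true positives whose observed label is 0
        "rho1": noisy_of_true_pos.count(0) / len(noisy_of_true_pos),
        # fraction of true negatives whose observed label is 1
        "rho0": noisy_of_true_neg.count(1) / len(noisy_of_true_neg),
        # fraction of observed positives that are actually negatives
        "observed_pos_truly_neg": true_of_obs_pos.count(0) / len(true_of_obs_pos),
    }


# Illustrative setup: balanced classes, half of the positives mislabeled.
y = [1] * 5000 + [0] * 5000
s = flip_labels(y, rho1=0.5, rho0=0.2)
rates = empirical_rates(y, s)
print(rates)
```

The measured rates should land close to the chosen ρ1 and ρ0. A learner in this setting would see only `s`, never `y`, which is why estimating the noise rates is part of the problem.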

Similar resources

An Information-Theoretic Discussion of Convolutional Bottleneck Features for Robust Speech Recognition

Convolutional Neural Networks (CNNs) have shown their performance in speech recognition systems, both for feature extraction and for acoustic modeling. In addition, CNNs have been used for robust speech recognition, and competitive results have been reported. A Convolutive Bottleneck Network (CBN) is a kind of CNN that has a bottleneck layer among its fully connected layers. The bottleneck fea...

Learning to Rank from Noisy Data

Learning to Rank, which learns the ranking function from training data, has become an emerging research area in information retrieval and machine learning. Most existing work on learning to rank assumes that the training data is clean, which is, however, not always true. The ambiguity of query intent, the lack of domain knowledge, and the vague definition of relevance levels, all make it diffic...

A Self-Organizing Neural Network that Learns to Detect and Represent Visual Depth from Occlusion Events

This paper discusses issues in noise-tolerant learning from sensory data. A model-driven approach to symbolic learning from noisy data is suggested. Introduction: Sensor-driven characteristics of visual objects are rarely noise free and most often quite noisy. "The visual world is noisy. Even well posed visual computations are often numerically unstable, if noise is present in both the scene and ...

Robust Loss Functions under Label Noise for Deep Neural Networks

In many applications of classifier learning, training data suffers from label noise. Deep networks are learned using huge training data where the problem of noisy labels is particularly relevant. The current techniques proposed for learning deep networks under label noise focus on modifying the network architecture and on algorithms for estimating true labels from noisy labels. An alternate app...

Learning Transformations for Clustering and Classification

A low-rank transformation learning framework for subspace clustering and classification is proposed here. Many high-dimensional data, such as face images and motion sequences, approximately lie in a union of low-dimensional subspaces. The corresponding subspace clustering problem has been extensively studied in the literature to partition such high-dimensional data into clusters corresponding to...

Journal:
  • CoRR

Volume abs/1705.01936

Publication date: 2017